NSF PAR Search | NSF Public Access Repository

FSP: Towards Flexible Synchronous Parallel Frameworks for Distributed Machine Learning

https://doi.org/10.1109/TPDS.2022.3228733

Wang, Zhigang; Tu, Yilei; Wang, Ning; Gao, Lixin; Nie, Jie; Wei, Zhiqiang; Gu, Yu; Yu, Ge (February 2023, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
NeutronStar: Distributed GNN Training with Hybrid Dependency Management

https://doi.org/10.1145/3514221.3526134

Wang, Qiange; Zhang, Yanfeng; Wang, Hao; Chen, Chaoyi; Zhang, Xiaodong; Yu, Ge (June 2022, ACM SIGMOD Conference on Data Management)

Full Text Available
Self-assembled aluminum oxyhydroxide nanorices with superior suspension stability for vaccine adjuvant

https://doi.org/10.1016/j.jcis.2022.07.022

Bi, Shisheng; Li, Min; Liang, Zhihui; Li, Guangle; Yu, Ge; Zhang, Jiarui; Chen, Chen; Yang, Cheng; Xue, Changying; Zuo, Yi Y.; et al (December 2022, Journal of Colloid and Interface Science)

Full Text Available
MIDIA: exploring denoising autoencoders for missing data imputation

https://doi.org/10.1007/s10618-020-00706-8

Ma, Qian; Lee, Wang-Chien; Fu, Tao-Yang; Gu, Yu; Yu, Ge (November 2020, Data Mining and Knowledge Discovery)

Full Text Available
Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing

Wang, Qiange; Zhang, Yanfeng; Wang, Hao; Geng, Liang; Lee, Rubao; Zhang, Xiaodong; Yu, Ge (June 2020, Proceedings of ACM SIGMOD Conference on Management of Data)
Recursive aggregate processing plays an important role in database and large-scale data analytics. It is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking the conditions for monotonicity and for the correctness of async processing is complicated and done manually. Third, execution engines may be suboptimal due to the separation of sync and async execution. In this paper, we lay an analytical foundation for conditions to check whether a recursive aggregate program, monotonic or even non-monotonic, can be executed incrementally and asynchronously while still producing the correct result. We design and implement a condition verification tool that can automatically check whether a given program satisfies the conditions. We further propose a unified sync-async engine to execute these programs for high performance. To integrate all these methods, we have developed a distributed Datalog system called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs.
Full Text Available
Learning Embeddings of Intersections on Road Networks

https://doi.org/10.1145/3347146.3359075

Wang, Meng-xiang; Lee, Wang-Chien; Fu, Tao-yang; Yu, Ge (November 2019, Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems)

Road networks are a basic component of intelligent transportation systems (ITS) in smart cities. Informative representation of road networks is important because it is essential to a wide variety of ITS applications. In this paper, we propose a neural network representation learning model, namely Intersection of Road Network to Vector (IRN2Vec), to learn embeddings of road intersections that encode rich information in a road network by exploring the geo-locality and intrinsic properties of intersections and the moving behaviors of road users. In addition to model design, several issues unique to IRN2Vec, including data preparation for model training and various relationships among intersections, are examined. We evaluate the learned embeddings via extensive experiments on three real-world datasets using three downstream test cases, including prediction of traffic signals and crossings on intersections and travel time estimation. Experimental results show that the proposed IRN2Vec outperforms three existing methods, DeepWalk, LINE and Node2vec, in terms of F1-score in predicting traffic signals (22.21% to 23.84%) and crossings (8.65% to 11.65%), and mean absolute error (MAE) in travel time estimation (9.87% to 19.28%).
Full Text Available
REMIAN: Real-Time and Error-Tolerant Missing Value Imputation

https://doi.org/10.1145/3412364

Ma, Qian; Gu, Yu; Lee, Wang-Chien; Yu, Ge; Liu, Hongbo; Wu, Xindong (October 2020, ACM Transactions on Knowledge Discovery from Data)

Missing value (MV) imputation is a critical preprocessing step for data mining. Nevertheless, existing MV imputation methods are mostly designed for batch processing, and thus are not applicable to streaming data, especially poor-quality streaming data. In this article, we propose a framework, called Real-time and Error-tolerant Missing vAlue ImputatioN (REMAIN), to impute MVs in poor-quality streaming data. Instead of imputing MVs based on all the observed data, REMAIN first initializes the MV imputation model based on a-RANSAC, which is capable of detecting and rejecting anomalies in an efficient manner, and then incrementally updates the model parameters upon the arrival of new data to support real-time MV imputation. As the correlations among attributes of the data may change over time in unforeseeable ways, we devise a deterioration detection mechanism to capture the deterioration of the imputation model and further improve the imputation accuracy. Finally, we conduct an extensive evaluation of the proposed algorithms using real-world and synthetic datasets. Experimental results demonstrate that REMAIN achieves significantly higher imputation accuracy than existing solutions. Meanwhile, REMAIN reduces time cost by up to one order of magnitude compared with existing approaches.
Full Text Available
On Representation Learning for Road Networks

https://doi.org/10.1145/3424346

Wang, Meng-Xiang; Lee, Wang-Chien; Fu, Tao-Yang; Yu, Ge (February 2021, ACM Transactions on Intelligent Systems and Technology)

Informative representation of road networks is essential to a wide variety of applications in intelligent transportation systems. In this article, we design a new learning framework, called Representation Learning for Road Networks (RLRN), which explores various intrinsic properties of road networks to learn embeddings of intersections and road segments in road networks. To implement the RLRN framework, we propose a new neural network model, namely Road Network to Vector (RN2Vec), to learn embeddings of intersections and road segments jointly by exploring their geo-locality and homogeneity, the topological structure of the road networks, and the moving behaviors of road users. In addition to model design, issues involving data preparation for model training are examined. We evaluate the learned embeddings via extensive experiments on several real-world datasets using different downstream test cases, including node/edge classification and travel time estimation. Experimental results show that the proposed RN2Vec robustly outperforms existing methods, including (i) feature-based methods: raw features and principal components analysis (PCA); (ii) network embedding methods: DeepWalk, LINE, and Node2vec; and (iii) features + network structure-based methods: network embeddings and PCA, graph convolutional networks, and graph attention networks. RN2Vec significantly outperforms all of them in terms of F1-score in classifying traffic signals (11.96% to 16.86%) and crossings (11.36% to 16.67%) on intersections and in classifying avenue (10.56% to 15.43%) and street (11.54% to 16.07%) on road segments, as well as in terms of mean absolute error (MAE) in travel time estimation (17.01% to 23.58%).
Full Text Available
